Redundancy (information theory)
Redundancy in information theory is the number of bits used to transmit a message minus the number of bits of actual information in the message. Informally, it is the amount of wasted "space" used to transmit certain data. Data compression is a way to reduce or eliminate unwanted redundancy, while checksums are a way of adding desired redundancy for purposes of error detection when communicating over a noisy channel of limited capacity.
Quantitative definition
In describing the redundancy of raw data, recall that the rate of a source of information is the average entropy per symbol. For memoryless sources, this is merely the entropy of each symbol, while, in the most general case of a stochastic process, it is
$$r = \lim_{n \to \infty} \frac{1}{n} H(M_1, M_2, \dots, M_n),$$
the limit, as $n$ goes to infinity, of the joint entropy of the first $n$ symbols divided by $n$. It is common in information theory to speak of the "rate" or "entropy" of a language. This is appropriate, for example, when the source of information is English prose. The rate of a memoryless source is simply $H(M)$, the entropy of a single symbol $M$, since by definition there is no interdependence between the successive messages of a memoryless source.
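As a rough illustration of the statement about memoryless sources, the sketch below computes the entropy of a single symbol and compares it with the joint entropy of the first n symbols divided by n. The three-symbol distribution and the function names are assumptions chosen for the example, not part of the definition.

```python
import math
from itertools import product

def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def block_entropy_rate(probs, n):
    """Joint entropy of n independent, identically distributed symbols, divided by n."""
    # For a memoryless source the probability of a block is the product
    # of the probabilities of its symbols.
    joint = [math.prod(block) for block in product(probs, repeat=n)]
    return entropy(joint) / n

probs = [0.5, 0.25, 0.25]               # hypothetical memoryless source
h = entropy(probs)                      # entropy of one symbol
rate_3 = block_entropy_rate(probs, 3)   # joint entropy of 3 symbols, per symbol
print(h, rate_3)
```

For this source both values agree, which is what the absence of interdependence between successive symbols implies. For a source with memory, the per-symbol joint entropy would generally decrease with n toward the rate.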
The absolute rate of a language or source is simply
$$R = \log |\mathbb{M}|,$$
the logarithm of the cardinality of the message space, or alphabet, $\mathbb{M}$. (This formula is sometimes called the Hartley function.) This is the maximum possible rate of information that can be transmitted with that alphabet. (The logarithm should be taken to a base appropriate for the unit of measurement in use.) The absolute rate is equal to the actual rate if the source is memoryless and has a uniform distribution.
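The remark about the base of the logarithm can be made concrete with a small sketch. The 26-letter alphabet and the helper name `absolute_rate` are assumptions for illustration; base 2 gives bits, base e gives nats, and base 10 gives hartleys.

```python
import math

def absolute_rate(alphabet_size, base=2):
    """Hartley function: logarithm of the alphabet size in the given base."""
    if alphabet_size < 1:
        raise ValueError("alphabet_size must be at least 1")
    return math.log(alphabet_size, base)

# Hypothetical 26-letter alphabet, measured in three different units.
bits = absolute_rate(26, 2)
nats = absolute_rate(26, math.e)
hartleys = absolute_rate(26, 10)
print(f"{bits:.3f} bits, {nats:.3f} nats, {hartleys:.3f} hartleys")
```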
The absolute redundancy can then be defined as
$$D = R - r,$$
the difference between the absolute rate $R$ and the rate $r$.
The quantity $D/R$, the absolute redundancy $D$ divided by the absolute rate $R$, is called the relative redundancy and gives the maximum possible data compression ratio, when expressed as the percentage by which a file size can be decreased. (When expressed as a ratio of original file size to compressed file size, the quantity $R : r$, where $r$ is the rate, gives the maximum compression ratio that can be achieved.) Complementary to the concept of relative redundancy is efficiency, defined as $r/R$, so that $r/R + D/R = 1$. A memoryless source with a uniform distribution has zero redundancy (and thus 100% efficiency), and cannot be compressed.
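The quantities above can be worked through numerically. This is a minimal sketch for a memoryless source, assuming a made-up four-symbol distribution; the function name and dictionary keys are illustrative only.

```python
import math

def redundancy_stats(probs):
    """Rate, absolute rate, and redundancy measures (in bits) of a memoryless source."""
    if len(probs) < 2:
        raise ValueError("need at least two symbols so the absolute rate is nonzero")
    rate = -sum(p * math.log2(p) for p in probs if p > 0)   # r
    absolute_rate = math.log2(len(probs))                   # R
    absolute_redundancy = absolute_rate - rate              # D = R - r
    return {
        "rate": rate,
        "absolute_rate": absolute_rate,
        "absolute_redundancy": absolute_redundancy,
        "relative_redundancy": absolute_redundancy / absolute_rate,  # D / R
        "efficiency": rate / absolute_rate,                          # r / R
        # R : r, unbounded when the source carries no information
        "max_compression_ratio": absolute_rate / rate if rate > 0 else math.inf,
    }

skewed = redundancy_stats([0.5, 0.25, 0.125, 0.125])   # hypothetical source
uniform = redundancy_stats([0.25, 0.25, 0.25, 0.25])
print(skewed)
print(uniform)
```

For the skewed source the rate is 1.75 bits against an absolute rate of 2 bits, so the relative redundancy is 12.5% and the efficiency is 87.5%. The uniform source has zero redundancy.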
Other notions of redundancy
A measure of redundancy between two variables is the mutual information or a normalized variant. A measure of redundancy among many variables is given by the total correlation.
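Both measures can be computed from a joint distribution. The sketch below uses the standard definition of total correlation as the sum of the marginal entropies minus the joint entropy; for two variables this equals the mutual information. The dictionary representation of the joint distribution and the example distributions are assumptions made for illustration.

```python
import math
from collections import defaultdict

def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def total_correlation(joint):
    """Sum of marginal entropies minus joint entropy, in bits.

    `joint` maps outcome tuples (x1, ..., xk) to probabilities.
    With k = 2 the result is the mutual information I(X1; X2).
    """
    n_vars = len(next(iter(joint)))
    marginals = [defaultdict(float) for _ in range(n_vars)]
    for outcome, p in joint.items():
        for i, value in enumerate(outcome):
            marginals[i][value] += p
    return sum(entropy(m.values()) for m in marginals) - entropy(joint.values())

copies = {(0, 0): 0.5, (1, 1): 0.5}                            # Y always equals X
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}   # two fair, independent bits
triple = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}                      # three identical bits

print(total_correlation(copies), total_correlation(independent), total_correlation(triple))
```

Two identical fair bits share 1 bit, independent bits share none, and three identical bits have a total correlation of 2 bits, since two of the three are fully redundant.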
Redundancy of compressed data refers to the difference between the expected compressed data length of $n$ messages, $L(M^n)$ (or expected data rate $L(M^n)/n$), and the entropy $nr$ (or entropy rate $r$). (Here we assume the data is ergodic and stationary, for example, a memoryless source.) Although the rate difference $L(M^n)/n - r$ can be made arbitrarily small as $n$ increases, the actual difference $L(M^n) - nr$ cannot, although it can be theoretically upper-bounded by 1 in the case of finite-entropy memoryless sources.
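The gap between the two differences can be sketched with a simple block code. The example below assumes Shannon code lengths, where a block of probability p gets a codeword of ceil(-log2 p) bits; this is one convenient choice for illustration, not the only or the best code. The two-symbol source is likewise hypothetical.

```python
import math
from itertools import product

def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def shannon_block_redundancy(probs, n):
    """Return (total, per_symbol) redundancy in bits for blocks of n symbols.

    Blocks of n independent symbols are assigned Shannon code lengths
    ceil(-log2 p); these lengths satisfy the Kraft inequality, so a
    prefix code with them exists.
    """
    block_probs = [math.prod(block) for block in product(probs, repeat=n)]
    expected_length = sum(p * math.ceil(-math.log2(p)) for p in block_probs)
    total = expected_length - n * entropy(probs)   # expected length minus n * r
    return total, total / n                        # and the rate difference

probs = [0.7, 0.3]   # hypothetical memoryless binary source
results = {n: shannon_block_redundancy(probs, n) for n in range(1, 5)}
for n, (total, per_symbol) in results.items():
    print(f"n={n}: total={total:.4f} bits, per symbol={per_symbol:.4f} bits")
```

With this code the total redundancy stays below 1 bit for every block length, so the per-symbol redundancy is below 1/n and shrinks as n grows, while the total itself need not approach zero.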